- B) PowerPC Support in C or C++
- ==============================
-
- In principle, PPC development in C/C++ runs in 5 phases:
-
- 1) Rewrite all 68k ASM Stuff in C
- 2) Adapt Source to ANSI/StormC
- 3) Adapt to PPC
- 4) Contextswitch-Optimizing
- 5) Further Adaptions
-
- Contrary to what you might believe, 3) is only a very small step;
- the big step is 2). And yes, you can already do most of this even if
- you do not own a PPC. I will now explain the different steps of
- development in more detail.
-
- It should be stressed that steps 1) and 2) are best done already
- while developing a 68k version, even if at first no PPC version is
- planned. This simplifies PPC development a lot, and in fact it does
- not need much extra work...
-
- It should also be noted that things are not that easy with the PPC
- software from Phase 5. That things can be this easy is a special
- feature of the WarpOS software.
-
- I won't discuss rewriting 68k ASM to C source here; you should be able
- to do this yourself.
-
- 2) Adapt Source to ANSI/StormC
- ------------------------------
-
- Most of the work is not the adaptation to PPC, but the adaptation from
- SAS/C or GNU C to StormC. StormC is a strict ANSI compiler, so it only
- knows the standard C functions contained in the ANSI standard.
- Some of the unsupported functions can be emulated using the
- not-yet-released UnixLib, though.
-
- Note that if your program already compiles on SAS/C with the STRICT ANSI
- mode set, most of this step is done. You can think of StormC as a compiler
- that ALWAYS runs in STRICT ANSI mode.
-
- The following SAS/C functions are not contained in ANSI, and thus not
- supported by StormC (most of them are quite exotic, and you may not even
- know many of them, even if you are a proficient C coder):
-
- astcsma isascii iscsym iscsymf toascii scdir stcpm
- stcpma stcsma stccpy stpcpy stcis stcisn stclen
- stpbrk stpchr stpchrn strcmpi strnset
- strset stcarg stpsym stptok stpblk strbpl strdup
- strins strmid stcd_i stcd_l ecvt fcvt gcvt
- stch_i stch_l stci_d stci_h stci_o stcl_d stcl_h
- stcl_o stco_i stco_l stcu_d stcul_d toascii stpdate
- stptime __datecvt __timecvt utpack utunpk cot iabs
- max min pow2 __emit getreg putreg geta
- isatty ovlyMgr dqsort fqsort lqsort sqsort strsrt
- tqsort drand48 erand48 jrand48 lcong48 lrand48 mrand48
- nrand48 seed48 srand48 __autoopenfail chkabort Chk_Abort
- _CXBRK __exit onexit _XCEXIT forkl forkv onbreak
- wait waitm bldmem rstmem sizmem chkml getmem
- getml halloc lsbrk sbrk _MemCleanup rbrk rlsmem
- rlsml memccpy movmem repmem setmem swmem except
- __matherr poserr datecmp timer __tzset getch fgetchar
- fputchar _dread _dwrite read write clrerr close
- _dclose fcloseall creat _dcreat _dcreatx fdopen fileno
- fmode iomode open _dopen flushall mkstemp mktemp
- setnbf _dseek lseek tell access chkufb chmod
- fstat getfa getft stat stcgfe stcgfn stcgfp
- strmfe strmfn strmfp strsfn unlink argopt chgclk
- dos_packet getclk getasn getdfs putenv rawcon stackavail
- stacksize stackused chdir closedir dfind dnext findpath
- getcd getcwd getfnl getpath mkdir opendir readdir
- rmdir seekdir rewinddir telldir readlocale scr_beep scr_bs
- scr_cdelete scr_cinsert scr_clear scr_cr scr_curs scr_cursrt scr_cursup
- scr_eol scr_home scr_ldelete scr_lf scr_linsert scr_tab _CXFERR
- _CXOVF _EPILOG _PROLOG
-
- The most important of the "not allowed" functions are the Level 0 I/O
- functions (open, close, read, write). Use fopen, fclose, fread and fwrite
- instead.
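-
- As a minimal sketch of such a conversion (copy_file is a hypothetical helper
- name, not taken from any real source), a file copy that would typically be
- written with open/read/write/close can be done with the stream functions only:

```c
#include <stdio.h>

/* Copy src to dst using ANSI stream I/O only (no Level 0 I/O).
 * Returns 0 on success, -1 on any error.
 * copy_file is a hypothetical helper name used for illustration. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    int result = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    /* fread returns the number of bytes actually read, 0 at end of file */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            result = -1;
            break;
        }
    }
    if (ferror(in))
        result = -1;
    fclose(in);
    if (fclose(out) != 0)
        result = -1;
    return result;
}
```

- The "b" in the mode strings makes no difference on the Amiga, but it keeps
- the same source correct on systems that distinguish text and binary files.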
-
- Note: Some of these functions might in fact be included. In the first version
- of this text I mistakenly listed stricmp and strnicmp as not included (which
- is wrong), so there might be more errors in the list :) But probably not
- many... probably none...
-
- But STRICT ANSI does not only limit the functions. Some constructs that
- only cause a warning from SAS/C cause an error from a strict ANSI compiler.
-
- Things like:
-
- char *string=malloc(300);
-
- cause an error from StormC. Correct would be:
-
- char *string=(char *)malloc(300);
-
- ANSI wants STRONG TYPING. If you do not own StormC, but want your code to
- compile easily with StormC PPC later, compile with STRICT ANSI. Problems
- appear especially with function pointers. If you are not sure how to cast
- something for STRICT ANSI, try void *; it often works for source that is
- not strongly typed.
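-
- A small sketch of a strictly typed function pointer, using the standard
- qsort as an example (compare_ints and sort_ints are hypothetical names).
- The compare function has exactly the prototype qsort expects, so the
- function pointer itself needs no cast; only the void * arguments are
- cast inside the function:

```c
#include <stdlib.h>

/* Compare callback with the exact prototype required by qsort. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    if (x < y)
        return -1;
    if (x > y)
        return 1;
    return 0;
}

/* Sort n ints in ascending order. */
void sort_ints(int *values, size_t n)
{
    qsort(values, n, sizeof(int), compare_ints);
}
```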
-
- You should also replace all K&R syntax (example)
-
- int main(argc,argv)
- int argc;
- char **argv;
-
- by the normal prototype syntax (example)
-
- int main(int argc,char **argv)
-
- Code like
-
- int a=5;
- int stuff[a];
-
- is also not legal in ANSI C. Array dimensions have to be constants.
- If you need a variable size, allocate the array dynamically with malloc.
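-
- A minimal sketch of the malloc variant (make_sequence is a hypothetical
- name). The result of malloc is cast explicitly, as recommended above, and
- the caller has to free() the array when it is no longer needed:

```c
#include <stdlib.h>

/* Allocate an array of n ints on the heap and fill it with 0..n-1.
 * Returns NULL if the allocation fails. The caller must free() it. */
int *make_sequence(size_t n)
{
    size_t i;
    int *stuff = (int *)malloc(n * sizeof(int));

    if (stuff == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        stuff[i] = (int)i;
    return stuff;
}
```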
-
- A good method to convert to "Strict ANSI" is the following:
-
- 1. Just compile it, and look at every warning and error
- 2. Typecast everything that looks like a pointer (and causes
- an error) to void *, and everything else that causes problems
- to an int, long or double.
- 3. If some things still don't work, have a look at them now.
-
- Some sources (like the source of Doom) require parts of the
- Unix/TCP includes. If you need such things, please contact me;
- I have converted the needed files to StormC (contact address
- see below).
-
- Now we are nearly done with the ANSI/StormC adaptation. Finally,
- some keywords have to be defined differently:
-
- #define __stdargs
- #define __regargs
- #define __asm
- #define __far FAR
- #define __inline inline
- #define __volatile volatile
-
- __chip, __fast and __interrupt do not exist on StormC; they have to
- be replaced by the appropriate OS functions. Some programmers also
- use some strange combinations that won't work (static inline is complete
- nonsense, get it Unix-coders :) !!! Static OR inline but not both of them !!!)
-
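- And while we are at "bad coding style": be careful with bitfields. ANSI C
- only guarantees them for the types int, signed int and unsigned int; bitfields
- of other types are a compiler extension and are better avoided in C sources...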
-
- Ah, and one word to those fclose-always-works fans: no, fclose does
- not work if the file is NOT OPEN !!! You crash your task if you try
- to close a file that is not open.
-
- Do
-
- if (file) fclose(file);
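-
- The same check can be wrapped in a small helper, sketched here under the
- assumption that all your file pointers start out as NULL (close_file is a
- hypothetical name). It also resets the pointer, so a second call is harmless:

```c
#include <stdio.h>

/* Close *fp only if it is open, then mark it as closed.
 * Safe to call again on the same pointer, or with fp == NULL. */
void close_file(FILE **fp)
{
    if (fp != NULL && *fp != NULL) {
        fclose(*fp);
        *fp = NULL;
    }
}
```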
-
- Some words on __attribute__ ((packed)): it does not exist, and it is a
- feature that would slow down the PPC *much* if it existed. Please
- do not use __attribute__ ((packed)). The PPC needs a certain alignment
- to reach optimal speed.
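-
- Packed structs are typically used to map file or network headers directly
- onto memory. One portable alternative, sketched here as an assumption and
- not as a StormC feature, is to assemble each field from single bytes, which
- works at any alignment. read_be16 and read_be32 are hypothetical helper names:

```c
/* Read a 16-bit big-endian value from any (possibly odd) address. */
unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

/* Read a 32-bit big-endian value from any (possibly odd) address. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}
```

- The fields are then copied into a normal, naturally aligned struct once,
- and all further accesses run at full speed.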
-
- About Text Constants longer than a line:
-
- It is legal to write:
-
- char *bla="...."\
- "...."\
- "....";
-
- But the last character before the \ has to be the closing " here.
-
- The notation
-
- char bla[]={"..."\
- "..."};
-
- is not legal (This is sometimes used in GNU C Sources).
-
- If you have done all this, you now (should) have a working StormC 68k source.
- Now we go to the PPC stuff. Most of the work is done; only small things
- remain. PPC handling is mostly done internally by the compiler.
-
- 3) Adapt to PPC
- ---------------
-
- First we have to change register parameters:
-
- void test(register __a0 char *mytest);
-
- has to be changed (for example) to
-
- void test(register char *mytest);
-
- The PPC does not know a register a0. But you can still ask the compiler to
- use a register with the keyword "register", without specifying a
- register number.
-
- Next we have to do some changes to OS-Includes:
-
- Up to now, depending on which compiler you used, you did (example):
-
- #include <clib/exec_protos.h>
- #include <pragmas/exec_pragmas.h>
-
- or
-
- #include <clib/exec_protos.h>
- #include <pragma/exec_lib.h>
-
- or
-
- #include <clib/exec_protos.h>
- #include <inline/exec.h>
-
- or
-
- #include <proto/exec.h>
-
- For StormC PPC you do:
-
- #include <clib/exec_protos.h>
-
- Do not include any pragmas/pragma files, or you will be swamped by error-messages.
- Also do not include any proto/ files.
-
- If you want to compile your source for both 68k and PPC (without changing the
- source) you do:
-
- #include <clib/exec_protos.h>
- #ifndef __PPC__
- #include <pragma/exec_lib.h>
- #endif
-
- __PPC__ is always set correctly.
-
- Yet another difference between 68k and PPC concerns the usage of subtasks. If you
- want to run the subtask as a PPC task (recommended) you have to replace functions like
- CreateTask() by CreateTaskPPC() of the powerpc.library. I won't go into detail here;
- most of the time the API is absolutely identical to the usual functions, except for
- a PPC at the end of the function name. Read the WarpOS documentation for more
- information.
-
- The other method would be to run the subtask as a 68k task and call CreateTask().
- To do so you would have to make your program a mixed binary, though, and you also
- would not get the full PPC speedup. So usually (unless the subtask does many OS calls)
- the CreateTaskPPC() approach is the better method. Also, it is recommended not to
- use 68k subtasks in PPC programs, so that your program will get optimal speed
- on a 100% PPC Amiga system (which surely will appear some time in the future).
-
- Earlier versions of the compiler had problems with the Tags versions of OS functions.
- This has been fixed for quite some time now. I had not noticed, which is why earlier
- versions of this document said that you would have to change such code;
- I had not tested it for quite some time.
-
- Then we come to the BeginIO function. On the PPC compiler this function only
- exists with a library base. You can use the following code (example
- is for audio.device):
-
- #include <libraries/powerpc.h>
- #include <ppcamiga.h>
-
- void BeginIOAudioPPC(struct IORequest *arg1)
- {
- extern struct Library *AudioBase;
- ULONG regs[16];
- regs[9] = (ULONG) arg1;
- __CallLibrary(AudioBase,-30,regs);
- }
-
- An example of how this can be used (from the sound code of ZhaDoom...):
-
- AudioBase = (struct Library *)audio_io->ioa_Request.io_Device;
- c = &channel_info[cnum];
- c->audio_io->ioa_Request.io_Command = CMD_WRITE;
- c->audio_io->ioa_Request.io_Flags = ADIOF_PERVOL;
- c->audio_io->ioa_Data = &chip_cache_info[cache_chip_data (id)].chip_data[8];
- c->audio_io->ioa_Length = lengths[id] - 8;
- c->audio_io->ioa_Period = period_table[pitch];
- c->audio_io->ioa_Volume = vol << 2;
- c->audio_io->ioa_Cycles = 1;
- #ifdef __PPC__
- BeginIOAudioPPC((struct IORequest *)c->audio_io);
- #else
- BeginIO ((struct IORequest *)c->audio_io);
- #endif
-
- You see? You always have to read out the library base of a device to do
- a BeginIO on PPC...
-
- Some readers now probably ask themselves about the famous
- "Context-Switch". Well, the truth is that under StormC the compiler
- deals with the contextswitch automatically. You won't have to think
- about it... I will say a few words about it anyway:
-
- There are two sorts of Contextswitches:
-
- a) Function-Contextswitches
-
- You have to compile with Debugging-Information the first time you compile
- the Source. Then the compiler handles the Contextswitches automatically.
- Later you can compile without Debugging-Information, if you want.
-
- b) Library-Contextswitches
-
- These need so-called "function stubs". ppcamiga.lib already contains
- the function stubs for all AmigaOS functions, and for the 68k functions
- of rtgmaster (but rtgmaster also has PPC functions, and it is
- advised to use those). To create a stub for a library that is not yet
- supported, you do:
-
- genppcstub mylib_protos.h mylib.fd VERBOSE
-
- You need the proto file and the FD file to create the stub. The stub is a
- C source file that you link together with your source. The contextswitch
- itself then works automatically.
-
- 4) Contextswitch-Optimizing
- ---------------------------
-
- With WarpOS a contextswitch takes about 0.5 milliseconds (with a 200 MHz
- PPC 604e board...). You should avoid doing many contextswitches
- per second (BTW: the Phase 5 software needs about 1 millisecond for a
- contextswitch).
-
- Examples of things to avoid:
-
- - Loading files byte by byte with fgetc (use fread instead
- and load into a Fastram buffer, from which you then fetch the data
- byte by byte)
- - WritePixel (work on a Fastram buffer instead)
- - OS calls that are called many times per second
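-
- The first point can be sketched as follows: the file is loaded with a
- single fread, and all byte-wise work then happens in memory. load_file
- and sum_bytes are hypothetical names, and plain malloc is assumed to
- return Fastram:

```c
#include <stdio.h>
#include <stdlib.h>

/* Load a whole file into a malloc'ed buffer with one fread call.
 * Returns NULL on error; on success *size receives the file length.
 * The caller must free() the buffer. */
unsigned char *load_file(const char *name, long *size)
{
    FILE *f;
    unsigned char *buf;
    long len;

    f = fopen(name, "rb");
    if (f == NULL)
        return NULL;
    /* Find the file length by seeking to the end. */
    if (fseek(f, 0L, SEEK_END) != 0 || (len = ftell(f)) < 0 ||
        fseek(f, 0L, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    buf = (unsigned char *)malloc(len > 0 ? (size_t)len : 1);
    if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL)
        *size = len;
    return buf;
}

/* Example of byte-by-byte processing that touches only memory. */
unsigned long sum_bytes(const unsigned char *buf, long size)
{
    unsigned long sum = 0;
    long i;

    for (i = 0; i < size; i++)
        sum += buf[i];
    return sum;
}
```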
-
- Graphics can be handled completely PPC Native by using rtgmaster.
- rtgmaster is a PPC Shared Library.
-
- Note that some of the standard C functions do contextswitches.
- I think clock() is among them, but I am not sure about it. A way
- to do timing that certainly avoids contextswitches is to read the PPC
- timer directly with a small PPC ASM routine. First, two scale factors:
-
- double tb_scale_lo = ((double)(bus_clock >> 2)) / 35.0;
- double tb_scale_hi = (4.294967296E9 / (double)(bus_clock >> 2)) * 35.0;
-
- bus_clock is set to the Bus Clock in Hz, for example 50000000 for
- a 150 MHz Board, 66000000 for a 200 MHz Board.
-
- Measuring time is then done like this (example: the I_GetTime function
- of Doom):
-
- int I_GetTime (void)
- {
- unsigned int clock[2];
- double currtics;
- static double basetics=0.0;
- ppctimer (clock);
- if (basetics == 0.0)
- basetics = ((double) clock[0])*tb_scale_hi + ((double) clock[1])/tb_scale_lo;
- currtics = ((double) clock[0])*tb_scale_hi + ((double) clock[1])/tb_scale_lo;
- return (int) (currtics-basetics);
- }
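-
- The conversion in I_GetTime can be checked on any machine by feeding it
- made-up timer values. The sketch assumes, as the bus_clock >> 2 above
- suggests, that the PPC timer counts at a quarter of the bus clock, and
- that the factor 35 is Doom's rate of 35 tics per second. tb_to_tics is
- a hypothetical helper name:

```c
/* Convert a PPC timer value (upper and lower 32-bit word) to Doom tics
 * (1/35 s), using the same scale factors as in the text above. */
double tb_to_tics(unsigned long tb_hi, unsigned long tb_lo,
                  unsigned long bus_clock)
{
    /* assumption: the timer counts at bus_clock / 4 */
    double tb_freq = (double)(bus_clock >> 2);
    /* timer counts per tic */
    double tb_scale_lo = tb_freq / 35.0;
    /* tics per step of the upper word (2^32 timer counts) */
    double tb_scale_hi = (4.294967296E9 / tb_freq) * 35.0;

    return (double)tb_hi * tb_scale_hi + (double)tb_lo / tb_scale_lo;
}
```

- With bus_clock = 66000000 the timer runs at 16.5 MHz, so a lower word of
- 16500000 corresponds to about 35 tics, that is one second.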
-
-
- ppctimer looks like this (object code for people who do not have
- StormPowerASM is contained inside this archive):
-
- vea
- XDEF _ppctimer
-
- _ppctimer: mftbu r4
- mftbl r5
- mftbu r6
- cmpw r4,r6
- bne _ppctimer
-
- stw r4,0(r3)
- stw r5,4(r3)
- blr
-
- But well, as I said, I am not sure whether clock() uses contextswitches or not.
- I only had the feeling that ZhaDoom sped up after I replaced the use of clock()
- by ppctimer().
-
- 5) Further Adaptions
- --------------------
-
- Note: The following is fully optional !!! (But it might speed up some things)
-
- It is possible to declare large memory areas as non-cacheable using the BAT registers
- of the PPC. For how exactly this is done, read the WarpOS documentation.
-
- Another optimization would be rewriting parts of the code in PPC assembler.
- For this, see below.
-
- In some newsgroups people discussed running parts of a program asynchronously on
- the 68k. Some even claimed this would only be possible with the Phase 5
- software. This is not true. If you want to implement it, you would use the
- PPC-native message system of WarpOS (keyword "AllocXMsg", refer to the WarpOS
- documentation). But I want to point out the disadvantages of this "parallel"
- method:
-
- 1) On PPC-only machines such code would have serious disadvantages. And such
- systems will come...
- 2) The PowerUP hardware is not well suited for true multiprocessing. As soon as
- your 68k/PPC tasks share memory, you will get serious problems. I won't go
- into detail; it was discussed enough in the newsgroup. And it really is not
- worth the effort.
-
- I seriously recommend working only "synchronously", running subtasks only on the
- same CPU the main program is running on.
-
- Sometimes it is also useful to do a manual contextswitch to a 68k ASM function,
- for example if the ASM function contains tons of OS calls. But if you have such
- code, I recommend using a mixed binary anyway. It makes things easier.
-
- PowerPC ASM Optimization
- ------------------------
-
- Finally this one. Again I have to say that it makes no sense to implement
- everything in PPC ASM. You start like this:
-
- 1) Implement everything in C
- 2) Compile it for 68k and use the profiler of StormC (the profiler currently
- only exists for 68k, but its data is also useful for PPC)
-
- When you use the profiler you run the program, and it collects statistics about
- which functions use how much CPU time. Then you implement the functions that
- take the most CPU time in PPC ASM. It is that simple.
-
- You have to keep in mind, though:
-
- - even ASM can't speed up massive numbers of contextswitches
- - ASM also can't speed up the slow GFX bus of the Amiga (even Zorro III is
- slow by today's standards...)
-
- Always remember:
-
- Doing a fast implementation in C and then using a profiler to find out which
- functions are worth an ASM optimization is much more clever than doing everything
- in PPC ASM.
-
- Of course the profiler is only available if you own StormC. SAS/C and GNU C
- do not have a profiler.
-
- Now, what do you do if your "original" source is in ASM, not in C? Well,
- you insert timing checks and write some timing data to a file ("manual
- profiling") at places where you think the most time is wasted. Of course,
- real profiling (using StormC) is much easier. Also remember that C
- names its functions like:
-
- _Functionname
-
- So if you want to profile ASM code you have to add a leading _ to all function names
- and XDEF them all.
-
- Example:
-
- stuff.asm
- ---------
-
- start:
-
- jsr morestuff
- ; lots of code
-
- rts
- ; lots of functions
- morestuff:
- ; lots of code
- rts
-
- This would have to be changed to:
-
- startit.c
- ---------
-
- extern void start(void);
-
- void main()
- {
- start();
- }
-
- stuff.asm
- ---------
-
- XDEF _start
- XDEF _morestuff
- ;... lots of functions
- _start:
- jsr _morestuff
- ;lots of code
-
- rts
- _morestuff:
- ;lots of code
- rts
-
- Well, and now you can start profiling... the C part simply starts the ASM
- main function...
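-
- The "manual profiling" mentioned above can be sketched in C around such a
- wrapper call. This is only an illustration: profile_call and busy_work are
- hypothetical names, busy_work stands in for the ASM routine (start in the
- example), and clock() is used because the measurement is meant for the
- 68k version:

```c
#include <stdio.h>
#include <time.h>

/* Stand-in for the routine to be measured (hypothetical). */
void busy_work(void)
{
    volatile unsigned long i;

    for (i = 0; i < 100000UL; i++)
        ;
}

/* Time one call of fn with clock() and append the result to a log file.
 * Returns the measured CPU time in seconds, or -1.0 if the log file
 * could not be opened. */
double profile_call(const char *label, void (*fn)(void), const char *logname)
{
    clock_t t0, t1;
    double secs;
    FILE *log;

    t0 = clock();
    fn();
    t1 = clock();
    secs = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;

    log = fopen(logname, "a");
    if (log == NULL)
        return -1.0;
    fprintf(log, "%s: %.6f s\n", label, secs);
    fclose(log);
    return secs;
}
```

- A call like profile_call("start", busy_work, "profile.log") then adds one
- line per measurement to the log file.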
-
-